152-2008: Coming to a Theater Near You! Sentiment Classification Techniques Using SAS® Text Miner
ABSTRACT
Many Web sites, including blogs, online stores, and some database Web sites, let users state their opinions about the products they buy and use. This is a serious concern for many companies because of the potential damage to their reputations. Companies want to monitor and gauge customers’ opinions, but they do not have the time or resources to manually gather statistics on this ever-expanding content source. Can companies reliably predict whether a customer comment is positive or negative without actually reading each posted comment? In this paper, we demonstrate how different techniques for classifying movie reviews can be implemented in SAS Enterprise Miner™ and SAS Text Miner. Movie reviews are classified not by category, but by sentiment.

INTRODUCTION

A large amount of textual content is subjective and reflects opinions. With the rapid growth of the Web, more people write online reviews for all types of products and services. It is becoming common practice for a consumer to learn why others like or dislike a product before buying it, and for a manufacturer to track customer opinions on its products to improve customer satisfaction. However, as the number of reviews for a product grows, it becomes harder to understand and evaluate customer opinions about that product.

Sentiment classification, also referred to as polarity, tone, or opinion analysis, can track changes in attitudes toward a brand or product, compare public attitudes toward one brand or product with those toward another, and extract examples of types of positive or negative opinions.

In this paper, we explore several techniques for improving sentiment classification with SAS Text Miner. Standard techniques in SAS Text Miner such as weightings, synonyms, and part-of-speech tagging are presented, along with more complex extraction and feature manipulation techniques.
In the next section, we provide the background on sentiment classification approaches, a description of the data set that we use for our experiments, and some technical information necessary to understand our techniques. In the sections that follow it, we present several methods for improving the effectiveness of sentiment classification:

1. Baseline SAS Text Miner – running SAS Text Miner with the default settings
2. Baseline SAS Text Miner with MI – running SAS Text Miner with the default settings and using mutual information (MI) for weighting terms
3. Sentence Filtering – building a model using relevant sentences and removing noisy sentences
4. Synonym Lists – assisting the prediction with a knowledge base of positive and negative terms
5. Sentiment Reversal of Terms – manipulating features to track the occurrence of “not” and “n’t” in text

Following these sections are experimental results applied to the data set described in the background section. Then, we present our conclusions.

BACKGROUND

SENTIMENT CLASSIFICATION APPROACHES

There are two basic approaches to sentiment classification. The first approach is to use a model that is based on the frequency of occurrence of each term in each document. Documents are represented as vectors of term frequencies, and the information about the sequence in which the terms appear is disregarded. The terms themselves become features. Techniques such as stop lists, singular value decomposition (SVD), and term weightings are used to select and enhance the features. This statistical approach is at the heart of SAS Text Miner and has significant advantages because the algorithms can learn the important characteristics from the collection as a whole. Surprising and unexpected terms might actually prove useful for prediction accuracy. Unfortunately, while this bag-of-words approach can be effective for classifying documents according to a topic or category, it is often less successful in classifying sentiment.
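To make the bag-of-words representation concrete, the following minimal Python sketch builds term-frequency vectors for two invented review snippets. It is an illustration only and does not reflect how SAS Text Miner parses or weights text; the helper names `term_frequencies` and `to_vector`, the simple regular-expression tokenizer, and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split it into word tokens, and count each term."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def to_vector(counts, vocabulary):
    """Lay the counts out in a fixed vocabulary order (0 for absent terms)."""
    return [counts.get(term, 0) for term in vocabulary]


# Two hypothetical snippets with opposite sentiment about plot and acting.
reviews = [
    "The plot was not good, the acting was bad",
    "The plot was good, the acting was not bad",
]

counts = [term_frequencies(review) for review in reviews]

# The vocabulary is the sorted set of all terms seen in the collection.
vocabulary = sorted(set().union(*counts))
vectors = [to_vector(c, vocabulary) for c in counts]

print(vocabulary)
for vector in vectors:
    print(vector)

# Word order is discarded, so both snippets map to the same vector.
print(vectors[0] == vectors[1])
```

Both snippets contain exactly the same terms with the same counts, so their vectors are identical even though the opinions differ. This is one plausible reason a pure term-frequency model can struggle with sentiment, and it suggests why features that track “not” and “n’t” might help.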
Data Mining and Predictive Modeling SAS Global Forum 2008